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Abstract 

This paper constructs an ensemble-based sampling smoother for four¬ 
dimensional data assimilation using a Hybrid/Hamiltonian Monte-Carlo 
approach. The smoother samples efficiently from the posterior proba¬ 
bility density of the solution at the initial time. Unlike the well-known 
ensemble Kalman smoother, which is optimal only in the linear Gaussian 
case, the proposed methodology naturally accommodates non-Gaussian 
errors and non-linear model dynamics and observation operators. Unlike 
the four-dimensional variational method, which only finds a mode of the 
posterior distribution, the smoother provides an estimate of the posterior 
uncertainty. One can use the ensemble mean as the minimum variance 
estimate of the state, or can use the ensemble in conjunction with the 
variational approach to estimate the background errors for subsequent as¬ 
similation windows. Numerical results demonstrate the advantages of the 
proposed method compared to the traditional variational and ensemble- 
based smoothing methods. 
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1 Introduction 

Data assimilation (DA) is the process of combining information from predictions 
made by imperfect models, from noisy observations, and from priors to produce a 
consistent description of the state of a dynamical system. The application of DA 
to large scale systems such as the atmosphere is of great practical interest. Two 
approaches for solving large DA problems have gained widespread acceptance. 
The hrst approach, originating from control theory, are variational methods such 
as the three-dimensional variational (3D-Var) and four-dimensional variational 
(4D-Var) strategies |35j . The variational methods find a maximum a-posteriori 
(MAP) estimate of the true state of the system. The second approach are 
the ensemble-based statistical estimation methods. The most successful family 
of ensemble data assimilation algorithms includes the ensemble Kalman filter 
(EnKF) [in] and its variants, the square-root Kalman filters [12], the ensemble 
adjustment Kalman filter |6], the ensemble transform Kalman filter and 
efficient implementations of Kalman Filter, such as |4], using the Sherman- 
Morrison formula. All variants of FnKF provide a minimum variance estimate 
of the state by approximating the expected value of the posterior distribution. 
Variational and statistical estimation methods yield identical estimates (only) 
in the case of linear dynamics, linear observations, and Gaussian errors. 

Both 3D-Var and FnKF are filtering methods that estimate the true state 
of the system at the specific time instances where observations are available. 
For many oceanographic, meteorological, and hydrological applications, it is 
advantageous to employ smoothing methods such as 4D-Var and FnKS that 
simultaneously use information from all observations available at different time 
points within an assimilation time window. Strong-constraint 4D-Var updates 
the state of the system at the initial time of an assimilation window given a 
background estimate of the initial condition and a set of observations distributed 
through this interval. The ensemble Kalman smoother, on the other hand, 
estimates the posterior distributions of the state at time points in the window 
given all past, present, and future observations (in the assimilation window). 

4D-Var requires the derivation and implementation of the tangent linear 
and the adjoint numerical models, a challenging and effort-intensive task for 
large-scale models. The 4D-Var algorithm provides a single analysis state, the 
best posterior estimate of the state of the system. The uncertainty in the es¬ 
timated state is not inherently provided by the 4D-Var algorithm [14]. Pre¬ 
vious work proposed to quantify the uncertainty in the 4D-Var analysis, by 
approximating the analysis error covariance matrix using an ensemble of simu¬ 
lations [211121] ■ These schemes provide an approximation of the analysis error 
covariance matrix that is inconsistent with the 4D-Var analysis itself because 
the covariance estimates are usually obtained from independent schemes such as 
EnKF. Approaches to quantify 4D-Var analysis uncertainty based on subspace 
error decompositions [nisi] are statistically consistent but require additional 
computational effort. 

The FnKS is optimal when the observation operators are linear and the 
errors are Gaussian. However, these assumptions are unlikely to hold for real 
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applications. The analysis ensemble generated by the EnKS allows to find a 
minimum variance estimate (e.g., ensemble mean), as well as a measure of the 
analysis uncertainty (e.g., the analysis error covariance matrix). When the 
posterior distribution is nearly Gaussian EnKS offers an efficient practical algo¬ 
rithm. However, when the observation operators are nonlinear and the errors 
are non-Gaussian, the EnKS is not expected to yield good results. 

The Markov Chain Monte Carlo (MCMC) family of algorithms provides 
a powerful foundation to sample from complicated probability distributions. 
These algorithms work in general by generating a Markov chain whose sta¬ 
tionary distribution is the target probability distribution. MCMC sampling is 
considered to be the gold standard |26j in data assimilation. The main practical 
limitation of MCMC is the considerable computational cost required to achieve 
convergence, and to explore the entire state space in the case of high dimensional 
state spaces. Scalable and accelerated MCMC algorithms are being continuously 
developed to improve convergence and space exploration. Hybrid Monte Carlo 
(HMC), also known as Hamiltonian Monte Carlo, is an accelerated MCMC sam¬ 
pling algorithm that reduces the correlation between successive states by using 
Hamiltonian dynamics to generate proposal states m- Moreover, HMC tar¬ 
gets states with high acceptance probability leading to fast convergence and fast 
space exploration. To the best of our knowledge, HMC was first considered in 
the context of DA in [5] to solve a nonlinear inverse problem by minimizing the 
residual between the solution and ill-posed boundary conditions. Posterior error 
statistics are approximated by sampling the nearby states to the optimal state 
after convergence. In [8] a simulated annealing strategy is used during the sam¬ 
pling process where the minimum is obtained at a low temperature and posterior 
samples are collected at high temperatures. Solving the weak-constraint 4D-Var 
problem using a gradient method then using HMC to estimate the analysis error 
statistics is also discussed in [33 Chapter 6]. 

This work develops a nonlinear non-Gaussian smoother to solve the four di¬ 
mensional data assimilation problem. The new method uses a Hybrid Monte 
Carlo approach to sample the posterior distribution and is named the HMC 
smoother. This work extends the sampling filter, proposed in |7], to the four 
dimensional case where time-distributed observations are assimilated at once. 
HMC smoother provides an ensemble of states sampled from the posterior dis¬ 
tribution of the state of the system at the initial time of the assimilation win¬ 
dow. In a practical setting the smoothing step is carried out sequentially over 
consecutive assimilation windows. The generated ensemble encapsulates the un¬ 
certainty in the posterior distribution at the beginning of the current window; 
when propagated to the beginning of the next assimilation window it provides 
flow-dependent information about the background error distribution in the next 
assimilation cycle. 

The use of HMC for solving smoothing problems was presented also in |3 , 
where a generalized version of HMC is used in an attempt to reducing the 
number of chain steps required to ensure independence of the generated states. 
The underlying dynamical system in the generalized version of HMC is not 
Hamiltonian. There are several important differences between this work and |5]- 
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In the only source of uncertainty is model error, in form of additive random 
noise included in the dynamics. Here we work in the strong constraint 4D-Var 
framework and consider the model to be perfect; the system state is uncertain 
due to uncertainties in the initial conditions. While in numerical experiments 
are carried out with a one-dimensional system, here we experiment with shallow 
water equations over the sphere, a moderately large multidimensional nonlinear 
model relevant for geophysical applications. Finally, we propose to use higher 
order symplectic integrators, as tested in [7], to efficiently sample from complex 
posterior distributions that arise when the observation operators and models 
are highly nonlinear. 

The remaining part of the paper is organized as follows. Section reviews 
the variational and the ensemble approaches for solving the data assimilation 
problem. The HMC smoother is presented in Section Experimental set¬ 
tings and numerical results are discusses in Section]^ Conclusions and future 
directions are summarized in Section [5] 


2 Data Assimilation 

This section reviews the 4D-Var and the EnKS data assimilation schemes. 


2.1 Four-dimensional variational data assimilation 


4D-Var calculates the optimal initial condition for the state of the dynamical 
system over a specific assimilation time window, based on a background state 
and using all observations available within this time window [34) . The back¬ 
ground initial state is usually the forecast produced by propagating the previous 
window analysis through the model dynamics. To be specific, let the current 
assimilation window be the time interval [to, tp]. Given a background state 
Xq = x^[to], and a set of observations {y^, = y[tfc]}fe=o,i....,NobB; available at the 
discrete time points {tfe}fe=o,i,...,Nobs C [to Af], the 4D-Var analysis is obtained 
by solving the following optimization problem: 


J- 2 

min J(xo) = - llxo -Xo|| i 

Xo z ^0 

^ Nobs 


( 1 ) 


Here Rk is the observation operator (generally nonlinear) that maps the model 
space into the observation space at time point t^. The dimension of observation 
space m is usually much lower than the dimension of the state space, that is 
m <§; Nvar- Bo is the background error covariance matrix, and Rfc’s are the 
observation error covariance matrices at each times tk', k = l,...,Nobs- The 
background error covariance matrix determines how information from observed 
areas are extrapolated to unobserved regions or where observations are sparsely 
available [4T]. The state x^ = x[tfc], is produced by propagating the initial state 
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Xq through the model dynamics from time to to point tk 

Xfc = A4o,fe(xo). (2) 

The model solution operator Ad represents the discretized partial differential 
equations that govern the evolution of the dynamical system. Realistic atmo¬ 
spheric and ocean models typically have Nvar ~ 10® — 10® state variables. 

In this work we consider the strong-constraint 4D-Var case which assumes 
that the numerical model ([^ is perfect. The methodology proposed here can 
be immediately extended to the weak-constraint 4D-Var framework |38j , where 
model errors are accounted for by adding the corresponding model error terms 
to equation Q [JT]. 

Perturbations (small errors 6x ) of the state of the system evolve according 
to the tangent linear model: 


SXk = Mo,/c(xo) -(5X0, to<t<tF, 


(3) 


where Mo,fe = dMo,k/dx[to] is the Jacobian of the model solution operator. 

In strong-constraint 4D-Var the model dynamics act as constraints to the 
optimization problem (2). The optimal initial condition obtained by solving the 
optimization problem (1) constrained by the model dynamics ([^ , is referred to 
as the analysis state xg = x^[to]- The gradient of the cost functional Q is: 


Vx„J(xo)=B-i(xo-x®) (4) 


where Mq ^ is the adjoint of the tangent linear model operator (§, Hfc = 
cl'Hfc/9xf. is the Jacobian of the observation operator "Hfe, and is its adjoint. 
Gradient-based minimization using Q requires the development of the tangent 
linear and the adjoint models, which is a challenging task for high-dimensional 
complex models of practical interest. The performance of the optimization can 
be improved by using the second order derivative information and adaptive 
observations as described in [3]. 


2.2 Bayesian interpretation of 4D-Var 

The knowledge of the system state at the initial time to prior to obtaining new 
observations is described by the background (prior) probability density 7^®(xo). 
The “sampling model” gives the probability distribution of observations condi¬ 
tioned by the initial state 

’^(yoj yi, ■ • •, yNobJxo), under the belief that the dynamical model Xfc = Ado.fc(xo) 
perfectly represents reality. From Bayes’ theorem: 

V’^ixo) =T’(xo|yo,yi,...,yN„bJ 

^ 'Pjyo^y !,■■■,yNobsl^o) v^i^o) 

'^(yo,yi,---,yNobJ 


(5a) 

(5b) 
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The posterior (analysis) PDF V‘^{xq) is the probability distribution of the initial 
state after incorporating the new knowledge contained in the observations. The 
denominator in ( |5b[ ) is the marginal density of the observations and acts as a 
normalization factor. 

The background and observations errors are usually assumed to have Gaus¬ 
sian distributions: 

T’'"(xo) oc exp ||xo - x|(||3_i^ , (6a) 

T’(yfclxfe) oc exp ||Hfe(xfc) - (6b) 

where Bq is the background error covariance matrix and Rfc’s are the observation 
error covariance matrices at times fc = 1,..., Nobs- If the observation errors 
at different time points are independent, and the model is perfect, the joint 
sampling model can be written as 

'P(yo,yi,---,yN„bJxo) oc 



Bayes’ rule ([^ yields the following posterior PDF 

T”"(xo) oc exp J'(xo)^, (7a) 

-1 -I Nobs 

= 2 11^0 - Xollg-i + -J2 ll'Wfe(xfc) - y/cIlR-i ■ (7b) 

For nonlinear models and nonlinear observation operators the posterior Q is 
not Gaussian. The kernel of the posterior is exp(—J'(xo)), where J'(xo) is the 
cost functional of the 4D-Var problem. 4D-Var computes the analysis Xq as the 
minimizer of J7. The 4D-Var solution is the MAP estimate of the initial state 
under the assumptions that the background and observation errors are Gaussian. 
For highly nonlinear observation operators and highly nonlinear models the 
posterior can have multiple modes. In this case the 4D-Var numerical solution 
can be trapped in a local minimum of the cost functional. 

The posterior distribution contains the complete characterization of the 
uncertain initial state of the dynamical system. However, calculating the full 
posterior with high dimensional models is infeasible in practice. A practical 
approach is to describe the posterior probability density by an ensemble of 
states, and to use it to estimate moments of the distribution. Sampling directly 
from the posterior PDF of the initial condition 0 acts as a smoother; the 
ensemble mean provides an estimate of the true state of the system, and the 
ensemble covariance an estimate of the posterior uncertainty that is totally 
consistent with the analysis state. In contrast, the current practice to use the 
analysis obtained from 4D-Var and the covariance obtained from EnKF or EnKS 
leads to inconsistent representations of uncertainty. 
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2.3 Ensemble Kalman filter and smoother 

Filtering is the process of calculating the posterior distribution of the uncertain 
state of a dynamical system at a specific time given observations only available 
at that time instance. The ensemble Kalman filter m represents probability 
distributions by samples. Let {x5l(e)}e=i,2,...,Ne„B be an ensemble of forecast 
states at time tk, and yt = y[tk] the observation vector (set of measurements) 
at tfc. If the forecast (background) ensemble is represented by the matrix X^, 
whose columns are the forecast ensemble members, then the updated (analysis) 
ensemble matrix at the same time tk is obtained as [20] 

= X^ Tfe , (8) 

where the matrix G ]^NensXNe„s jg nonlinear transformation constructed 
from the forecast ensemble and the observations at time tk [5D]. Square-root 
filters [42] , the ensemble transform Kalman filter m, and the ensemble adjust¬ 
ment Kalman filter [6| can all be written in the form § for specific choices of 
the transformation T^. 

Smoothing is the process of calculating the posterior distribution of the 
uncertain states of a dynamical system given past, present, and future observa¬ 
tions |13j . There are three approaches to smoothing: fixed-point, fixed-interval, 
and fixed-lag smoothing |18j . with the second and third ones being the most 
popular |33|. Any scheme that can be used to solve any of the three smoothing 
problems can also be employed as a single fixed-interval smoothing scheme [T3] . 
The ensemble smoother (ES) was introduced in [21j as a linear variance mini¬ 
mization algorithm. The ensemble Kalman smoother (EnKS) employs an 
ensemble of states to describe distributions and obtains the posterior using the 
Kalman filter updates equations. EnKS is optimal in case of linear dynamics, 
Gaussian errors, and large number of ensemble members |23j . 

To construct EnKS [22] the EnKE update equations ^ are used repeatedly 
to develop a fixed-lag, and a fixed-interval smoothing algorithms. A fixed-point 
smoother can be written as [20] 


X^ = Xj; • f] Tfe . (9) 

k^O 

The update equation © is used recursively for fixed-interval smoothing, where 
smoothed ensembles are obtained at specified set of times, and they are con¬ 
ditioned only on observations available at later times in the interval. Ravela 
and McLaughlin [32] presented efficient, fast versions of the fixed-interval and 
the fixed-lag EnKS. The fast fixed-interval smoother has a computational cost 
that scales linearly with respect to the length of the interval. In this work, we 
use the fast fixed-interval EnKS [33] , with a single smoothing point (fixed-point 
smoother) chosen at the beginning of the time interval. 

EnKS computes the minimum variance estimate of the state. This is not 
expected to be very accurate if the observations are highly nonlinear or if the 
Gaussianity assumptions are severely violated. As shown in Section]^ the HMG 
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sampling smoother proposed herein is capable of handling nonlinear observation 
and model operators, and consequently produces posterior estimates that are 
more useful than the EnKS ensemble, and contain more information than the 
4D-Var MAP analysis. Hybrid methods such as [3T] make use of optimization 
over ensembles using the trust-region framework. 


3 The hybrid Monte-Carlo sampling smoother 

The most popular and successful class of sampling algorithms is the Markov 
chain Monte-Carlo (MCMC) first introduced by Metropolis et al. [27] . 
MCMC algorithms sample from a general probability distribution 7^ (x) by build¬ 
ing a Markov chain whose invariant distribution is ’P(x). MCMC algorithms 
have an advantage of not requiring the normalization of target distributions. 
However, traditional MCMC samplers are often considered impractical for large 
dimensional problems due to the following drawbacks: The Markov chain may 
take a very long time to reach stationarity. A large number of (burn-in) states 
are generated and discarded before starting the sampling process, in order to 
guarantee that the collected samples are obtained from the true target PDF. The 
samples should be independent, however the Markov chain is not completely 
memoryless; in order to achieve independence of sampled states, the sampler 
usually drops some intermediate states between each selected state. Another 
drawback of most of Monte-Carlo sampling methods is the curse of dimension¬ 
ality [22]: as the dimension of the state spaces grows, the number of sample 
members needed to represent the probability distribution, grows rapidly. The 
number of samples required to efficiently represent the probability distribution 
can be controlled if the sampler surveys sufficiently fast the entire state space. 
The sampler can become trapped in a high-probability mode of a multi-modal 
distribution, and fail to represent the other probability modes. 


3.1 Hybrid Monte-Carlo 

Hybrid/Hamiltonian Monte-Carlo (HMC) [IB] follows an auxiliary-variable ap¬ 
proach in order to alleviate the limitations of the traditional MCMC algorithms. 
The phase space of a Hamiltonian dynamical system consists of points (p, x) G 
where x G is the position variable, and p G R”''" is the momen¬ 

tum variable. The Hamiltonian dynamics is governed by the set of ordinary 
differential equations (ODEs): 


-=VpH(p,x), 

^ = -VxH(p,x), 

where the Hamiltonian function H describes the total energy of the system 
H(p,x) = ^p^M-ip + ^(x). 


(10a) 


(10b) 
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The first term of the Hamiltonian (10b I represents the potential energy of the 
system, while the second term corresponds to the kinetic energy. The exact 
(analytic) flow of the Hamiltonian system (10a I advances the solution in time 
from t = 0 to t = T: 


$7' 


ii 2 Nv 


ii2Nv 


<i>T(p[0],x[0]) = (p[r],x[T]). 


( 11 ) 


This flow cannot be calculated exactly in practice, and has to be approximated 
by an equivalent numerical solution using a time reversible symplectic integra¬ 
tor |36[ I37j . The most common symplectic integrator is leapfrog (Stdrmer- 
Verlet) [5S1I3Z]. One step of the position Verlet algorithm advances the solution 
of the Hamiltonian equations (10a) from time tj to time tj+i = tj + h using: 


Xj + l/2 

Pj + 1 

^j + 1 


-M- 1 ’ 
2 


'■j 2 1^1 ’ 

Pj - /lVxJ(Xj + i/2) 


h . 
2 


Xj + l/2+^M ^Pj + i- 


(12a) 

(12b) 

(12c) 


The optimal time step size h must satisfy h oc (1/Nvar)^^^ [2], and careful 

empirical tuning of the step size is usually required for good performance [7]. 
Several other symplectic integrators with more stages and higher accuracy than 
Verlet have also been developed m- An infinite dimensional time integrator 
was also introduced in m- 

For practical considerations it is advisable to split the interval [0, T] where 
the Hamiltonian system evolves into m smaller sub steps of length h = Tjm. 
The flow of the numerical solution obtained by the symplectic integrator will be 
denoted by $7 and is an approximation of the exact flow $ 7 . 

The key idea in HMC sampling is to add an auxiliary variable p to the 
target variable x and sample from the joint probability distribution of (x, p). 
The auxiliary variable is chosen such that the sampling procedure from the joint 
distribution is much faster than sampling from the marginal distribution of the 
target variable. In HMC sampling the target and the auxiliary variables are 
thought of as the position and momentum components of a Hamiltonian system, 
respectively. The Hamiltonian dynamics of the system serves as a transition 
kernel to the Markov chain. 

The kernel of the stationary probability distribution of the Hamiltonian sys¬ 


tem ( 10 ) is 


exp(-H(p,x)) = exp ip-J^(x) 

= exp ^-ip^M'ip j • 7r(x), 


(13a) 

(13b) 


where 7r(x) = exp J7(x)j is the probability distribution of the position vari¬ 
able. The joint probability distribution of the state (p, x) in the phase space 
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]^2Nvar jg product of the marginal distributions of both the position and 
the momentum. This simply means that the two variables x and p are indepen¬ 
dent [36]. Independence of both position and momentum makes it possible to 
sample from the marginal distribution of each variable by sampling from their 
joint distribution. The marginal PDF of the momentum variable is a Gaussian 
distribution with zero mean and covariance matrix M (also known as the mass 
matrix), i.e., p ~ A/'(0, M). 

Let X ^ 7r(x) be a random variable that is the target of the MCMC sampling 
algorithm. View x as the position variable in the Hamiltonian system, and add 
the momentum p as an auxiliary variable. The symplectic integrator is used to 
propose a state that is either accepted or rejected using an acceptance/rejection 
rule based on the loss of energy. Algorithm [5^ summarizes the HMC steps to 
sample from the probability distribution 7r(x). The loss of energy between the 


Algorithm 1 HMCMC Sampling [36] . 

1: Initialize the Markov chain. Preferably (po, xq) should have high probability 
w.r.t. the target distribution. 

2: At each step k of the Markov chain draw the random auxiliary variable 
Pk ~ A/'(0,M). 

3 : Use a symplectic numerical integrator (e.g. position Verlet) to advance the 
current state (pfc,Xfe) by a time increment T to obtain a proposal state 
(p*,x*): 

(p*,x*) = $T((Pfe,Xfe)). (14) 


4 : 

5 : 


Use the Hamiltonian (10b I to approximate the loss of energy AH. 


Calculate the acceptance probability: 

aW =lAe-^^. 


(15) 


6: Discard both p* and pfc. 

7 : (Acceptance/Rejection) Draw a uniform random variable ~ kl{0, 1): 

i- If accept the proposal as the next sample: x^+i := x*; 

ii- If reject the proposal and continue with the current state: 

Xfc-i-i := ^k- 

8: Repeat steps 2 to 7 until sufficiently many distinct samples are drawn. 


current and the proposed state is usually calculated as the difference between 
the Hamiltonians at the current and the proposed states: 


AiL = i7(p*,x*)-H(pfc,Xfc). 


(16) 


This equation (16) is valid for the Verlet ( [I^ , two-stage, three-stage, and four- 
stage symplectic integrators [121ES] ■ See ^ for details on different symplectic 
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integrators and corresponding expressions for energy. The length of the Hamil¬ 
tonian trajectory T and the number of steps m are parameters to be tuned by 
the user [30]. Another user-tunable parameter is the mass matrix M, a sym¬ 
metric positive definite matrix that represents the covariance of the momentum 
variable. The choice of the mass matrix does not alter the fact that HMC 
sampling Algorithm converges to the stationary distribution 7r(x). However, 
a good choice of M can considerably improve sampling efficiency. One popu¬ 
lar and simple choice is to take M a constant multiple of the identity matrix. 
Ideally, if the variances of the target distribution 7r(x) are known (or can be 
approximated), the diagonal of M should be chosen as the corresponding preci¬ 
sions (reciprocals of these variances) |5^. We found that this choice results in 
a very fast convergence of the chain to stationarity. 

HMC sampling Algorithm tends to explore the state space faster than 
traditional MCMC, and the acceptance probability of all generated states is 
close to one. Several enhancements to the HMC sampling, such as parallel 
tempering UBiini, have been proposed to guarantee that the algorithm escapes 
local modes of high probability. 


3.2 Sampling smoother algorithm 

We now present the HMC smoother (smoothing by sampling) that simulta¬ 
neously accounts for all observations available within a specific assimilation 
window to obtain posterior estimates of the initial system state. 

Consider the assimilation window [tg, tp] with a set of observations available 
at times tg, ti,..., tNob,, inside the window, where tNobs = tp- Under the assump¬ 
tions discussed in Section 2.2 the posterior (analysis) probability distribution 
of the initial state Xg takes the form Q. We seek to sample from this poste¬ 
rior distribution using the HMC approach. For this we set the potential energy 
term in (10b) to be the 4D-Var cost functional (7b). Consequently the target 


probability distribution 7r(x) coincides with the 4D-Var posterior distribution 
i.e., 7r(x) = exp(-J'(x)). 

The smoother works sequentially over consecutive assimilation windows by 
applying the forecast and analysis (sampling) steps in succession. In the fore¬ 
cast step each state of the analysis ensemble is propagated in time to the end of 
the previous assimilation window (the beginning of the current window). The 
result of the forecast step is a forecast ensemble X*’ (or just a single background 
state Xq) at the beginning of the current time window, i.e. at tg. One can 
just propagate the analysis state (e.g. the mean of the analysis ensemble) to 
obtain the current background state Xg. However, propagating the full analysis 
ensemble makes it possible to build an ensemble-based (flow-dependent) back¬ 
ground error covariance matrix at the beginning of the current window. This 
background error covariance matrix includes the errors of the day, and can con¬ 
siderably enhance the quality of the analyses generated by a data assimilation 
scheme. We will assume herein that the full forecast ensemble is generated in 
the forecast step. In the analysis step, the HMC sampling strategy summarized 
in Algorithm 12 is applied to obtain the analysis ensemble at the initial time of 
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the current assimilation window. 

The HMC sampling smoother is detailed in Algorithm 


Algorithm 2 The Proposed Sampling Smoother 
1: Analysis step: Given the background state and observations, draw an 
ensemble of initial states from the posterior distribution 0 as follows: 

i- Calculate an ensemble-based forecast error covariance matrix B™®, 
and use it together with a fixed (modeled) matrix to construct the 
background error covariance matrix Bg [?]• (This step can be omitted 
by using the modeled background error covariance matrix; however, 
the use of forecast ensemble is expected to improve the quality of the 
generated analysis ensemble.) 

ii- Build the mass matrix M as a diagonal matrix such that diag (M) = 
diag(Bo^). 

iii- Initialize the Markov chain to the best estimate of the current state 
available, e.g., the background state Xg, or a suboptimal 4D-Var solu¬ 
tion. A good choice speeds up the convergence of the chain. 

iv- Follow the steps in Algorithm to generate the chain and select en¬ 
semble members after the chain reaches stationarity. Dropping a small 
number (5 to 10) steps between each selected states helps to ensure 
the independence of the generated ensemble members. 

2: Forecast step: Propagate each member of the analysis ensemble, using the 
full forward model, to the end of the current assimilation window (beginning 
the next assimilation window). 


The generated ensemble of states {xQ(e)}e=i, 2 ,...,Ne„B) samples the posterior 
PDF V‘^{xo), and can be used to calculate the best estimate of the initial con¬ 
dition of the system (e.g., the mean (xg) of the ensemble), and to estimate the 
analysis error covariance matrix Ag: 


= (Nens) ^ Xj)(e) , 


(17a) 


AXg = 
An = 


K(l)-xg,...,xi)(Ne„s)-xg], 
(Nen. - 1)-1 (AXO (AXg^)^) . 


(17b) 


The forecast and analysis steps are repeated sequentially on subsequent assimi¬ 
lation windows. The propagated ensemble can be used to estimate the analysis 


covariance at the final time using (17b), such as to provide “flow-dependent” 


information for the background error covariance matrix used in the subsequent 
assimilation interval. 
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4 Numerical Experiments 

The proposed HMC sampling smoother is tested against the EnKS and the 4D- 
Var schemes on two numerical models. We first illustrate the distinctive features 
of the HMC smoother with a simple one-dimensional model with a nonlinear 
observation operator and a bimodal posterior distribution. Next, we employ 
the shallow water on the sphere to test the sampling smoother on a problem 
relevant to geophysics, and to compare its performance against the conventional 
4D-Var scheme. 


4.1 A one-dimensional model 

Consider the following model: 

dx dV (x) 
dt dx ’ 

H(x) = (x+lf (x-lf , 


(18a) 

(18b) 


that describes the position of a particle over the entire real line moving under 
the effect of the potential held (18bI. This model is similar to the one used 
in 1^. The potential held has two local minima at ±1, which are expected 
to act as attractors for the particle. We set the reference initial condition to 


= —0.15, and chose the background initial condition to be 


= 0 . 1 . 


Note that the true and the background initial conditions lie in the basins of 
attraction of different equilibria. The background errors are assumed to be 
normally distributed with zero mean and standard deviation a^Q = V^. 

Synthetic observations are obtained from the reference solution by applying 
the quadratic observation operator 


'W(xfc) = (Xfc)^ . 


(19) 


Observation errors are assumed to be Gaussian with zero mean and standard 
deviation dobs = 0.05. The simulation time window is [toj ^f] = [0, 0.12] (units), 
with equally spaced 12 observation points. The posterior distribution of the 
initial state reads 


V^{xo) oc exp 


/_! / xq-O.I 
i 2 1.41 


2 


1 

2 


Nobs = 12 

E 


/c=l 



( 20 ) 


where x^ is obtained by propagating Xq forward in time, from to = 0 to 
tk = k X 0.01 (units), using the model (18). The non-normalized posterior 
density (20) is illustrated in Figure Traditional assimilation methods, like 
4D-Var and EnKS, are expected to have difficulties capturing the bimodal na¬ 
ture of the posterior distribution of the initial condition. Since the prior PDF is 
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Figure 1: The non-normalized kernel of the posterior distribution (20). The 
right peak is slightly higher than than the left one as a result of the prior 
being a Gaussian distribution centered around x*’ = 0.1 with small standard 
deviation (cjxo = V^) Given the current settings, the right peak occurs at Xq = 
0.103 with F’(xo) = 0.02775 while the left peak occurs at Xq = —0.103 with 
P(xo) = 0.02746. 


a Gaussian centered around the background state x'^ = 0.1 with standard devi¬ 
ation CTxo = V^, the right peak in Figure is slightly taller than the left peak. 
With Gaussian background prior centered around one of the peaks, smaller 
standard deviation would damp the other peak. Capturing only that right peak 
completely misses the true solution, which is negative. Numerical results pre¬ 
sented below show that the proposed HMC smoother is capable of building a 
representative ensemble from the bimodal posterior distribution. The analysis 
ensemble can then be used to draw more useful conclusions (e.g. statistics) than 
what can be obtained from analysis results obtained by the traditional methods. 


4.1.1 Numerical results with the one-dimensional model 


HMC smoothing was carried out to collect an ensemble of 100 members from the 


posterior (201. We tested several symplectic integrators [7] , and found that all 
show similar behavior. We chose the position Verlet symplectic integrator due 
to its minimal computational cost for all our experiments. The Hamiltonian 
system step size is empirically tuned to T = 0.1, with step length h = 0.01, 
and number of steps m = 10. The number of burn-in steps is chosen to be 20 
(for this simple model we already know that the forecast state 0.1 lies in the 
support of the posterior and the burn-in steps could be omitted; in general one 
can incorporate convergence tests to shorten the number of burn-in steps and 
ensure that the collected samples are from the target distribution). Four states 
are dropped between consecutive selected states at stationarity to guarantee the 
independence of the samples. 

The histograms of the analysis ensembles obtained with the HMC smoother 
and EnKS are shown in Figure The HMC smoother generates an analysis 
ensemble that matches the kernel shown in Figure [l] but EnKS fails to generate 
an accurate analysis ensemble. The most likely state seems to be located in 
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(b) Ensemble Kalman smoother. 


Figure 2: Histograms of the analysis ensembles generated by HMC smoother, 
and EnKS. The number of ensemble members generated by each smoother is 
100. For the HMC smoother the step of the symplectic integrator (12) is T = 0.1, 
with h = 0.01 and m = 10. 


the correct place. A single analysis state (best estimate) in this case might 
be misleading. One needs to consider more than one analysis with certain 
probability to give better description of the true state of the system in case of 
multi-modal systems. 

The 4D-Var algorithm is expected to be trapped in a local minimum of the 
posterior distribution. Since the background state is closer to -1-1 than to —1, 


and since the observations (19) are insensitive to the sign of solution, we expect 


the analysis to follow the behavior of the true solution but with the opposite 
sign. This is confirmed by results in Figure 
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Figure 3: Data assimilation results using 4D-Var, together with the forecast and 
reference trajectories plotted over the assimilation window. 


4.2 Shallow water model on the sphere 

The shallow water equations have been used extensively as a simplified model 
of the atmosphere that contains the essential wave propagation mechanisms 
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found in general circulation models (GCMs)[3H]. The shallow water equations 
in spherical coordinates are 


du 1 
dt a cos 9 


du ^du 

+vcos9 — 
dX dO 


- / + 


itan9 


g dh 
a cos 9 dX 


dv 

dt 


1 

a cos 9 


dv ^dv 

+ ncose* — 
dX d9 


+ 



utan9 


,gdh 
a d9 


dh 1 / d{hu) d{hv cos 9)\ 

dt a cos 9 \ 9A d9 ) 


(21a) 


(21b) 


(21c) 


Here / is the Coriolis parameter, given by / = 2Hsin0, where fl is the angular 
speed of the rotation of the Earth. In addition, h represents the height of the 
homogeneous atmosphere, u and v are the zonal and meridional wind compo¬ 
nents, respectively. The latitudinal and longitudinal directions are respectively 
denoted by 9 and A. The radius of the Earth is denoted by a and g is the 
acceleration due to gravity. The space discretization is performed using the 
unstaggered Turkel-Zwas scheme [5S]. The discretization has nlon=72 nodes 
in longitudinal direction and nlat=36 nodes in the latitudinal direction. The 
semi-discretization in space leads to a system of ordinary differential equations: 


x' = /(t,x), x(to) = xo; to = 0, tF = 9 (hours). (22) 


The vector x S K" with n = 3 x nlat x nlon contains discrete versions of the 
zonal wind, meridional wind, and the height variables. We perform the time 
integration using a S**' order Runge-Kutta method. This time-integrator is part 
of the MATLODE suite [T^ , which also has sensitivity analysis capabilities. 


4.2.1 Observations and background information 

A reference initial condition is used to generate a reference trajectory. Synthetic 
linear observations are created from the reference trajectory by adding Gaus¬ 
sian noise with zero mean for each of the three components. The observation 
error standard deviation for height component is set to 1.5% of the average 
magnitude of the reference height component in the reference initial condition. 
The observation error standard deviation for wind components is set to 10% 
of the average magnitude of the reference wind component in the initial condi¬ 
tion. The initial background state is created by perturbing the reference initial 
condition with a Gaussian error drawn from the distribution A/'(0, Bq), with a 
modeled background error covariance matrix. The background error covariance 
Bo is modeled as follows: 

• Start with a diagonal background error covariance matrix. The standard 
deviation of the background errors for the height component is 2% of 
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the average magnitude of the reference height component in the reference 
initial condition. The standard deviation of the background errors for the 
wind components is 15% of the average magnitude of the reference wind 
component in the reference initial condition. 

• Synthetic initial ensemble is created by adding zero-mean Gaussian noise 
to the reference initial condition with covariances set to the initial (diag¬ 
onal) background error covariance matrix. Apply the ensemble Kalman 
filter for 48 cycles with observations obtained each hour. The uncertainties 
in observations are fixed to 1.5%, and 10% for the height and wind com¬ 
ponents respectively. The synthetic observations are obtained by adding 
Gaussian noise with zero mean and standard deviation equal to the un¬ 
certainty level multiplied by the average magnitude of the corresponding 
component (height and wind) in the initial condition. 

• Decorrelate the ensemble-based covariances using a decorrelation matrix 
p with decorrelation distance L = 1000 km. 

• Galculate Bq by averaging the ensemble covariances over the last 6 hours 
with one matrix per hour. In this version the background noise levels are 
no longer 2% and 15%. 

This method of creating a synthetic initial background error covariance matrix 
is empirical, but we found that the resulting background error covariance matrix 
performs well for several algorithms including 4D-Var. Enhancing the quality 
of this background error covariance matrix can be done by making use of the 
ensembles generated by the sampling smoother, an idea that we will investigate 
in future work. 

Data assimilation experiments with this model were conducted for three 
consecutive assimilation windows. The time interval of the first assimilation 
window is [0,6] hours, the second window is [6,14] hours, and the third is 
[14, 20] hours. The short first window can be regarded as a spin-off period for 
the data assimilation system. Hourly (synthetic) observations are available each 
of the three windows, with a total of 6 observation times in the first window, 
and 8 observation times in each of the last two windows. 

Two experiments were conducted. In the Hrst one the background error 
covariance matrix Bq is kept fixed for each of the three windows. In the sec¬ 
ond experiment Bq is updated with information from the generated ensemble 
according to the following expression: 

ghybrid ^ ^ ^ gmodeled ^ , ( 23 ) 

where is the updated version of Bq, and is the fixed version 

used in the first experiment. The scalar weight 7 is a number in the interval 
[0,1]. Selecting 7 = 1 ignores the error-of-the-day, while 7 = 0 forces the use of 
only the flow-dependent background error covariance matrix obtained from the 
ensemble, possibly leading to a singular covariance matrix. In our experiments 
we chose 7 = 0.75. 
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The error metric used to compare analyses against the reference solution is 
the root mean squared error (RMSE): 


RMSE = 


\ 


£ (X,; - Xf 


(24) 




where x‘''“® is the reference state of the system. The RMSE is calculated hourly 
along the trajectory over each assimilation window. 


4.2.2 Numerical results with the shallow water on the sphere model 

The numerical optimization step in 4D-Var is carried out using the the LBFGS 
routine implemented in the Poblano optimization toolbox m- Here the opti¬ 
mization process is stopped when the norm of the gradient is le — 10 or when 
the relative function tolerance hits le — 6. The optimization process takes at 
least 45 iterations of LBFGS to converge for the experiment considered here; 
(see Table . 

The HMG sampling smoother is used to generate 100 ensemble members in 
each assimilation window. The symplectic integrator used is Verlet (12) with 
an empirically tuned step length of T* = 0.1, with /i* = 0.01, and m = 10. A 
practically useful recipe is to perturb the step length of the symplectic integra¬ 
tor, a procedure that guarantees that the results obtained are not contingent on 
that specific selection of step settings naED]. The step length h is perturbed 
with uniform random noise: h := {1 + r) x h^, r ^ U{—0.2, 0.2). It is important 
to notice that the step h is pertubed only once at the beginning of the Hamil¬ 
tonian trajectory and kept fixed for all the m steps. This actually means that 
the length of the Hamiltonian trajectory T is perturbed for each proposal state 
while keeping the number of steps m constant such that at each run the step 
size h scales accordingly 

The number of burn-in steps is set to 30. We noticed that the HMG smoother 
converges to the posterior in much fewer steps (5—10). Four generated states are 
discarded between each selected state in the ensemble to guarantee independence 
of the generated ensemble members. 
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Figure 4: Data assimilation results for two scenarios using linear observations 
are shown. The first panel shows RMSE with Bq being hxed. The second 
panel 4b shows RMSE with Bq being updated using (23). The symplectic 


integrator used in both cases is Verlet (12) with step T = 0.1, where h = 0.01 


and m = 10. The number of dropped states between selected samples is 4. 


The RMS errors for both HMC smoother and 4D-Var over the three as¬ 
similation windows are shown in Figure Figure reports the case where 
the background error covariance matrix Bq is kept hxed, and Figure [4b] shows 
the case where Bg is updated, at the beginning of each assimilation window, 
according to equation (23). The quality of the analyses in both cases is very 


similar, however in the second case a slight reduction in RMSE is noticed along 
the entire trajectory. This is appreciated by Figure [^zooming onto the RMSE 
results over the second assimilation window. 
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Figure 5: Same as results in Figure with only RMS errors over the second 
window displayed for two scenarios. RMS errors obtained while keeping Bq 
fixed are plotted as dotted lines. RMS errors obtained with Bq being updated 
at the beginning of the assimilation window are plotted as dotted lines. 

The HMC smoother can sample efficiently from the posterior distribution 
and resulting analysis competes in accuracy with that obtained using 4D-Var. 
Figure|^shows the three components at the beginning of the first window for the 
reference solution, the background state, 4D-Var analysis, and HMC smoother 
analysis (ensemble mean). Results shown in The analysis recovered from the 
noisy background by both 4D-Var and HMC smoother are almost identical. 

The assimilation results obtained over the next two windows with Bq kept 
fixed are shown in Figures and The performance of the two schemes, 4D- 
Var, and the HMC smoother is quite similar, and the HMC smoother analysis 
competes with the 4D-Var analysis. 

Updating the background error covariance can, in principle, enhance the 
performance of both the 4D-Var and the HMC smoother. In the case of 4D-Var, 
updating Bq results in lower RMSE which indicates that in real applications, 
the analysis is expected to be closer to reality. In addition to more accurate 
prior kernel, updating Bq will result in a better update of the mass matrix M 
which in turn is expected to result in better performance of the smoother. In our 
experiments the update has a small positive impact on the performance of the 
two data assimilation schemes as explained in Figures The positive effect 
here is explained by reduction in the RMS errors. The resulting ensemble-based 
forecast error covariance matrix that is used to update Bg can be crucial for cases 
where observations are sparse or not uniformly distributed over the grid, and 
therefore well worth the computational overhead of the forward propagation of 
all the analysis ensemble members to build the full forecast ensemble. Results of 
the data assimilation process with hybrid (updated) background error covariance 
matrix on the next two windows are shown in Figures and [T0| 
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(a) Reference solution at the 
initial time, H component 


(b) Reference solution at the 
initial time, U component 
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Figure 6: Four dimensional data assimilation results with linear observations. 
The initial condition solutions at the beginning of the first window are shown. 
The data assimilation scheme and the state components are indicated under each 
panel. The assimilation window length is 6 hours, with hourly observations. The 
background error covariance matrix Bq is kept fixed. 
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(a) Reference solution at the 
initial time, H component 



(b) Reference solution at the 
initial time, U component 



(c) Reference solution at the 
initial time, V component 
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(f) HMC smoother analysis 
at the initial time, V compo¬ 
nent 





(g) 4D-Var analysis at the 
initial time, H component 


(h) 4D-Var analysis at the 
initial time, U component 





(i) 4D-Var analysis at the ini¬ 
tial time, V component 


Figure 7: Four dimensional data assimilation results with linear observations. 
The initial condition solutions at the beginning of the second window are shown. 
The data assimilation scheme and the state components are indicated under each 
panel. The assimilation window length is 8 hours, with hourly observations. The 
background error covariance matrix Bq is not updated. 
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(a) Reference solution at the 
initial time, H component 



(b) Reference solution at the 
initial time, U component 



(c) Reference solution at the 
initial time, V component 
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Figure 8: Four dimensional data assimilation results with linear observations. 
The initial condition solutions at the beginning of the third window are shown. 
The data assimilation scheme and the state components are indicated under each 
panel. The assimilation window length is 8 hours, with hourly observations. The 
background error covariance matrix Bq is kept fixed. 
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Figure 9: Four dimensional data assimilation results with linear observations. 
The initial condition solutions at the beginning of the second window are shown. 
The data assimilation scheme and the state components are indicated under each 
panel. The assimilation window length is 8 hours, with hourly observations. The 
background error covariance matrix Bq is updated using (23) for both schemes. 




















































A. Attia and V. Rao and A. Sandu HMC sampling smoother for 4D-DA 24 



(a) Reference solution at the 
initial time, H component 



(b) Reference solution at the 
initial time, U component 



(c) Reference solution at the 
initial time, V component 
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Figure 10: Four dimensional data assimilation results with linear observations. 
The initial condition solutions at the beginning of the third window are shown. 
The data assimilation scheme and the state components are indicated under each 
panel. The assimilation window length is 8 hours, with hourly observations. The 
background error covariance matrix Bq is updated using (23) for both schemes. 

















































A. Attia and V. Rao and A. Sandu HMC sampling smoother for 4D-DA 25 


4.3 Computational considerations 


Both 4D-Var and HMC smoother require the same computational infrastruc¬ 
ture, namely, an adjoint model that computes the gradient of the 4D-Var cost 
functional (7b I. This gradient calculation is the computational bottleneck for 
both 4D-Var and HMC smoother. It requires forward propagation of the model, 
and a backward propagation of the adjoint model. In our shallow water test 
the cost of one adjoint model run is approximately equivalent to 2.5 times the 
cost of one forward model run. This makes the cost of gradient evaluation 
approximately equal to 3.5 the cost of a forward model run. 


Table 1: Data assimilation scheme cost for the shallow water model with linear 
observations. Number of function evaluations (forward model runs), and the 
number of optimization iterations (for 4D-Var) are listed. The cost of one adjoint 
run is approximately equal to 2.5 forward model runs in the current settings. 
Total cost is approximated in terms of number of forward model runs. The cost 
of the HMC sampling smoother is the same for the three assimilation windows. 


Data as¬ 
similation 
scheme 

Cost 

Assimilation window 


(2) 

m 

Fixed 

Bo 

Fixed 

Bo 

Hybrid 

Bo 

Fixed 

Bo 

Hybrid 

Bo 

4D-Var 

Function evaluations 

151 

97 

101 

96 

93 

Number of iterations 

49 

47 

46 

46 

45 

Cost in equivalent for¬ 
ward model runs 

322.5 

261.5 

262 

257 

250.5 

HMC 

smoother 

Number of proposed 
states 

530 

Cost per proposal 

4.5 

Cost in equivalent for¬ 
ward model runs 

2,385 


The cost of the 4D-Var depends on the number of iterations and function 
evaluations required by the optimization algorithm. On the first window the 
number of iterations required by the LBFGS optimizer is 49, with 151 function 
evaluations. The total cost of the 4D-Var solution is then 151-1-49 x 3.5 = 322.5 
equivalent forward model runs. 

The cost of the HMC smoother depends on the configuration of the chain: 
the number of burn-in step, the number of dropped states at stationarity, and 
step-size settings of the symplectic integrator. The symplectic integrator itself 
controls the number of adjoint runs to evaluate the gradient of the cost func¬ 
tional in order to propose a new state to the chain. The size of the desired 
ensemble controls the length of the Markov chain and consequently the total 
cost of the analysis step by the HMC smoother. The Verlet integrator (12), 
used in the current experiments, requires a single adjoint run to propose a new 
state to the chain. The acceptance/rejection criterion requires an additional 
forward run to evaluate the loss of energy. This makes the cost of generat¬ 
ing a proposal state to the Markov chain approximately equal to 4.5 the cost 
of a model run. On all assimilation windows the chosen ensemble size is 100. 
The number of burn-in states is 30, and 4 states are rejected between consec- 
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utive selected samples. The HMC sampling smoother, in this case generates 
30 + 100 X 5 = 530 states to collect the analysis ensemble, with a total cost 
roughly equal to 530 x 4.5 = 2, 385 forward model runs. 

The computational cost of the two data assimilation schemes, 4D-Var and 
HMC smoother, on each assimilation window are summarized in Table[2 Notice 
that the total cost of DA schemes is given in terms of the total number of 
forward model runs. On the first window the cost of the HMC smoother is 
approximately 9 times the cost of the 4D-Var scheme. On the next two windows, 
the HMC smoother costs roughly 11 times the cost of the 4D-Var. A cost- 
reduction in 4D-Var is expected because the forecast state is closer to the MAP 
than the case on the first window. The higher computational cost of the HMC 
smoother can be handled more efficiently by parallelizing the sampling scheme 
and the gradient calculations. Another way to reduce the computational cost 
of the HMC smoother is to replace the burn-in steps with a suboptimal 4D- 
Var obtained using a small number of iterations. The computational cost of 
the proposed sampling smoother can of course by be reduced by decreasing the 
ensemble size, however this will result in higher sampling error. The impact of 
the sampling errors can be assessed by the techniques developed in [2 [32] . 

The increased cost of the HMC smoother could be acceptable in view of 
the additional useful information it provides: a sample estimate of the analysis 
probability distribution (and as immediate consequences an analysis error co- 
variance matrix and a flow-dependent background error covariance matrix for 
the next cycle). 


5 Conclusion and Future Work 

A four-dimensional data assimilation smoother is proposed in this paper. The 
smoother samples from the posterior distribution using a Hybrid Monte-Carlo 
approach. The 4D-Var approach provides a MAP estimate of the true state, 
but it does not compute a measure of uncertainty of the analysis. The HMC 
smoother builds an ensemble approximating the posterior PDF. This can be 
used to estimate the true state together with the uncertainty in analysis, e.g., 
by calculating the ensemble mean and ensemble-based analysis error covariance 
matrix. Moreover, propagating the analysis ensemble to the beginning of the 
next assimilation window provides a forecast ensemble that can be used to 
construct a flow-dependent background covariance matrix for this new window. 
Unlike several popular hybrid approaches, the HMC smoother generates an 
analysis error covariance that is consistent with the analysis state - because 
both statistics are produced by one consistent data assimilation scheme. 

The HMC smoother requires an adjoint of the simulation model, and runs 
on the same computational infrastructure as 4D-Var. The computational cost 
of the HMC smoother is - as of now - larger than that of 4D-Var. The efficiency 
issue must be addressed before the HMC smoother becomes fully practical. 
We are currently investigating several strategies to enhance the performance of 
the sampling smoother and to reduce its computational cost. Parallelizing the 
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sampling smoother will be considered. We will also test the HMC smoother on 
the case of nonlinear observations, imperfect models, and non-Gaussian errors. 


Acknowledgements 

This work was partially supported by awards AFOSR FA9550-12-1-0293-DEF, 
NSF CCF-1218454, NSF DMS-1419003, and by the Computational Science 
Laboratory at Virginia Tech. 


References 

References 

[1] Exshall: A Turkel-Zwas explicit large time-step EORTRAN program for 
solving the shallow-water equations in spherical coordinates. Computers 
and Geosciences 17(9), 1311 - 1343 (1991) 

[2] A-posteriori error estimates for variational inverse problems. SIAM/ASA 
Journal on Uncertainty Quantification Submitted (2014) 

[3] An optimization framework to improve 4d-var data assimilation system 
performance. Journal of Computational Physics 275(0), 377 - 389 (2014) 

[4] An efficient implementation of the ensemble Kalman filter based on an it¬ 
erative ShermanMorrison formula. Statistics and Computing 25(3) (2015). 
DOI 10.1007/sll222-014-9454-4 

[5] Alexander, E.J., Eyink, G.L., Restrepo, J.M.: Accelerated monte carlo for 
optimal estimation of time series. Journal of Statistical Physics 119(5-6), 
1331-1345 (2005) 

[6] Anderson, J.: An ensemble adjustment Kalman filter for data assimilation. 
Monthly Weather Review 129, 2884-2903 (2001) 

[7] Attia, A., Sandu, A.: A hybrid Monte Carlo sampling filter for non-gaussian 
data assimilation. Quarterly Journal of the Royal Meteorological Society 
Submitted (2014) 

[8] Bennett, A.F., Chua, B.S.: Open-ocean modeling as an inverse problem: 
the primitive equations. Monthly weather review 122(6), 1326-1336 (1994) 

[9] Beskos, A., Pillai, N., Roberts, G., Sanz-Serna, J.M., Stuart, A.: Optimal 
tuning of the hybrid Monte Garlo algorithm. Bernoulli 19(5A), 1501-1534 
(2013) 

[10] Beskos, A., Pinski, E., Sanz-Serna, J.M., Stuart, A.: Hybrid Monte Carlo 
on Hilbert spaces. Stochastic Processes and their Applications 121(10), 
2201-2230 (2011) 



A. Attia and V. Rao and A. Sandu HMC sampling smoother for 4D-DA 28 


[11] Bishop, C., Etherton, B., Majumdar, S.: Adaptive sampling with the 
ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly 
Weather Review 129 , 420-436 (2001) 

[12] Blanes, S., Casas, F., Sanz-Serna, J.M.: Numerical integrators for the hy¬ 
brid Monte Carlo method. arXiv preprint arXiv:1405.3153 (2014) 

[13] Briers, M., Doucet, A., Maskell, S.: Smoothing algorithms for state-space 
models. Annals of the Institute of Statistical Mathematics 62(1), 61-89 
( 2010 ) 

[14] Cheng, H., Jardak, M., Alexe, M., Sandu, A.: A hybrid approach to esti¬ 
mating error covariances in variational data assimilation. Tellus A 62(3), 
288-297 (2010) 

[15] D’Augustine, A., Sandu, A.: MATLODE: A MATLAB ODE solver and 
sensitivity analysis toolbox. ACM Transactions on Mathematical Software 
(TOMS) Submitted (2015) 

[16] Duane, S., Kennedy, A., B.J. Pendleton, J.B., Roweth, D.: Hybrid Monte 
Carlo. Physics Letters B 195(2), 216-222 (1987) 

[17] Dunlavy, D., Kolda, T., Acar, E.: Poblano vl. 0: A MATLAB toolbox 
for gradient-based optimization. Sandia National Laboratories, Tech. Rep. 
SAND2010-1422 (2010) 

[18] Earl, D.J., Deem, M.W.: Parallel tempering: Theory, applications, and 
new perspectives. Physical Chemistry Chemical Physics 7(23), 3910-3916 
(2005) 

[19] Evensen, G.: Sequential data assimilation with a nonlinear quasi- 
geostrophic model using Monte Carlo methods to forcast error statistics 
. Journal of Geophysical Research 99(C5), 10,143-10,162 (1994) 

[20] Evensen, G.: The ensemble Kalman filter: theoretical formulation and 
practical implementation. Ocean Dynamics 53 (2003) 

[21] Evensen, G., van Leeuwen, P.: Assimilation of Geosat altimeter data for the 
Agulhas Gurrent using the ensemble Kalman filter with a quasigeostrophic 
model . Monthly Weather Review 124 , 85-96 (1996) 

[22] Evensen, G., van Leeuwen, P.: An ensemble Kalman smoother for nonlinear 
dynamics . Monthly Weather Review 128 , 1852-1867 (2000) 

[23] G., E.: Data Assimilation: The ensemble Kalman filter. Springer, Berlin 
(2007) 

[24] Gustafsson, N., Bojarova, J.: Four-dimensional ensemble variational (4D- 
En-Var) data assimilation for the high resolution limited area model 
(HIRLAM). Nonlinear Processes in Geophysics 21(4), 745-762 (2014) 



A. Attia and V. Rao and A. Sandu HMC sampling smoother for 4D-DA 29 


[25] Hunt, B.R., Kalnay, E., Kostelich, E., Ott, E., patil, D., sauer, T., Szun- 
yogh, L, Yorke, J., Zimin, A.: Four-dimensional ensemble kalman filtering. 
Tellus 56(4), 273-277 (2004) 

[26] Law, Stuart, A.M.: Evaluating data assimilation algorithms. 

Monthly Weather Review 140(11), 3757-3782 (2012) 

[27] Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E.: 
Equation of state calculations by fast computing machines. The Journal of 
Chemical Physics 21(6), 1087-1092 (1953) 

[28] Navon, I.M., De Villiers, R.: The application of the Turkel-Zwas explicit 
large time-step scheme to a hemispheric barotropic model with constraint 
restoration. Monthly weather review 115(5), 1036-1052 (1987) 

[29] Neal, R.: Probabilistic inference using Markov chain Monte Carlo methods. 
Department of Computer Science, University of Toronto Toronto, Ontario, 
Canada (1993) 

[30] Neal, R.: MCMC using Hamiltonian dynamics. Handbook of Markov chain 
Monte Carlo (2011) 

[31] Nino Ruiz, E., Sandu, A.: A derivative-free trust region framework for 
variational data assimilation. Computational and Applied Mathematics In 
print (2015) 

[32] Rao, V., Sandu, A.: A posteriori error estimates for dddas inference prob¬ 
lems. Procedia Computer Science 29, 1256-1265 (2014) 

[33] Ravela, S., McLaughlin, D.: Fast ensemble smoothing. Ocean Dynamics 
57(2), 123-134 (2007) 

[34] Sandu, A., Chai, T.: Chemical data assimilationan overview. Atmosphere 
2(3), 426-463 (2011) 

[35] Sandu, A., Cheng, H.: A subspace approach to data assimilation and 
new opportunities for hybridization. International Journal for Uncertainty 
Quantification In print (2015) 

[36] Sanz-Serna, J.: Markov chain Monte Carlo and numerical differential equa¬ 
tions. In: Current Challenges in Stability Issues for Numerical Differential 
Equations, pp. 39-88. Springer (2014) 

[37] Sanz-Serna, J., M-P.Calvo: Numerical Hamiltonian problems, vol. 7. Chap¬ 
man & Hall London (1994) 

[38] Sasaki, Y.: Some basic formalisms in numerical variational analysis. 
Monthly Weather Review 98(12), 875-883 (1970) 



A. Attia and V. Rao and A. Sandu HMC sampling smoother for 4D-DA 30 


[39] St-Cyr, A., Jablonowski, C., Dennis, J., Tufo, H., Thomas, S.: A com¬ 
parison of two shallow water models with nonconforming adaptive grids. 
Monthly Weather Review 136, 1898-1922 (2008) 

[40] Swendsen, R.H., Wang, J.S.: Replica Monte Carlo simulation of spin- 
glasses. Physical Review Letters 57(21), 2607 (1986) 

[41] Tremolet, Y.: Accounting for an imperfect model in 4D-Var. Quarterly 
Journal of the Royal Meteorological Society 132(621), 2483-2504 (2006) 

[42] Whitaker, J., Hamill, T.M.: Ensemble data assimilation without perturbed 
observations. Monthly Weather Review 130, 1913-1924 (2002) 



